Abstract. In electronic sports, cyberathletes conceal their online training
by using different avatars (virtual identities), which prevents them from being
recognized by the opponents they may face in future competitions. In
this article, we propose a method to tackle this avatar aliases identification
problem. Our method trains a classifier on behavioural data and
processes the confusion matrix to output label pairs which concentrate
confusion. We experimented with Starcraft 2 and report our first results.


1 Introduction 

In most online competitive games, players need an “avatar” (an online identity)
to log in to the game network. Nothing prevents a player from having several
avatars, and this is actually a very common practice for cyberathletes. Players
generally have one official avatar for official tournaments, and several others
to conceal their game tactics without being recognized by other players they may
meet online: global rankings and leagues are public just as in chess and tennis,
while game logs are available and prone to analysis by means of visualization and
machine learning, just as in standard sport analytics. Accordingly, we are facing
a set of players, generating behavioural data, in an unknown one-to-many
relationship with avatars (handling many-to-many relationships is left to future
work). In this context, the avatar aliases identification problem aims at
discovering the groups of avatars belonging to the same player. Solving this
problem is motivated by the growing need of e-sport structures to study the games
and strategies of their opponents (match preparation), and by the security
challenges of game editors (detecting avatar usurpers).

Yan et al. showed that a classifier can be trained to predict with high accuracy
the avatars involved in a game play of Starcraft 2 [5]. Nevertheless, they
purposely considered datasets without players having several avatars (what we
call avatar aliases): in the presence of such avatar aliases, the prediction
accuracy drastically degrades, since prediction models fail at differentiating
two avatars of the same player. We extend this work and address the avatar
aliases identification problem: our approach relies on mining the confusion
matrix yielded by a supervised classifier using Formal Concept Analysis [1], and
exploits the confusion a classifier exhibits in the presence of avatar aliases
when they belong to the same player. Experimental evaluation shows promising
results.

* This research has been partially funded by the French National Project FUI AAP 
14 Tracaverre 2012-2016. 



2 Basic notations and general intuition 


Let $\mathcal{A}$ be a set of avatars and $\mathcal{T}$ be a set of traces such
that, for a given avatar $a \in \mathcal{A}$, the set
$\mathcal{T}_a \subseteq \mathcal{T}$ is the set of all traces generated by $a$.
Consider a classifier $\rho$ where labels are the avatars to predict. A
classifier is a function $\rho : \mathcal{T} \to \mathcal{A}$ that assigns the
avatar $\rho(t) \in \mathcal{A}$ to a given trace $t \in \mathcal{T}$. Let
$n = |\mathcal{A}|$ be the number of avatars in $\mathcal{A}$. From any
classifier $\rho$, one can derive a confusion matrix
$C^{\rho}_{n \times n} = (c_{ij})$ where
$c_{ij} = |\{t \in \mathcal{T}_{a_i} \text{ s.t. } \rho(t) = a_j\}|$. Each row
and column of $C^{\rho}$ corresponds to an avatar, while the value $c_{ij}$ is
the number of traces of avatar $a_i$ that are classified by $\rho$ as belonging
to avatar $a_j$. The normalized confusion matrix is given by
$\hat{C}^{\rho} = [c_{ij} / |\mathcal{T}_{a_i}|]$, where
$\hat{C}^{\rho}_{ii} = 1$ for any $i \in [1, |\mathcal{A}|]$ means that all the
traces of avatar $a_i$ are correctly classified by $\rho$.

Our goal is to discover the groups of avatars that belong to the same player.
Our intuition is that a classifier will hardly differentiate these avatar
aliases, hence the confusion matrix values should be high and concentrated
around them. This is exemplified in Figure 1: avatars $\{a_1, a_2\}$ are
candidates to belong to the same player, $\{a_4, a_5\}$ shall belong to another
player, while $a_3$ stays as a singleton with a high diagonal value. A
reasonable clustering of avatars would be given by
$\{\{a_1, a_2\}, \{a_3\}, \{a_4, a_5\}\}$.

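More formally, given a normalized confusion matrix $\hat{C}^{\rho}$, we would
like to find pairs of avatars $a_i, a_j \in \mathcal{A}$ such that

$$\hat{C}^{\rho}_{ii} \simeq \hat{C}^{\rho}_{ij} \simeq \hat{C}^{\rho}_{ji} \simeq \hat{C}^{\rho}_{jj}
\quad\text{and}\quad
\hat{C}^{\rho}_{ii} + \hat{C}^{\rho}_{ij} + \hat{C}^{\rho}_{ji} + \hat{C}^{\rho}_{jj} \simeq 2.$$

These conditions come from the fact that, if $a_i, a_j$ correspond to the same
player, traces in $\mathcal{T}_{a_i}$ have the same probability of being
classified as $a_i$ or $a_j$ (the same holds for traces in $\mathcal{T}_{a_j}$).
Furthermore, for a trace of avatar $a_i$, it is required that the probability of
classification is spread between $a_i$ and $a_j$ only, meaning that
$\hat{C}^{\rho}_{ii} + \hat{C}^{\rho}_{ij} \simeq 1$ (similarly for $a_j$).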
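The two conditions can be checked numerically on a small example. The following is a minimal sketch in plain Python, not the authors' implementation: the function names and the tolerance `tol` are illustrative assumptions, since the text only states the conditions as approximate equalities.

```python
def normalized_confusion(true_labels, predicted_labels, avatars):
    """Row-normalized confusion matrix as a list of lists.

    Entry [i][j] is the fraction of traces of avatars[i] that the
    classifier labelled as avatars[j].
    """
    index = {a: k for k, a in enumerate(avatars)}
    n = len(avatars)
    counts = [[0] * n for _ in range(n)]
    for truth, predicted in zip(true_labels, predicted_labels):
        counts[index[truth]][index[predicted]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def looks_like_alias_pair(C, i, j, tol=0.15):
    """Check both alias conditions for avatars i and j.

    tol is a hypothetical tolerance standing in for the approximate
    equalities; it is not a value taken from the paper.
    """
    block = [C[i][i], C[i][j], C[j][i], C[j][j]]
    similar = max(block) - min(block) <= tol          # four values close together
    concentrated = abs(sum(block) - 2.0) <= 2 * tol   # confusion stays inside the pair
    return similar and concentrated


# Toy data: a1 and a2 are confused with each other, a3 is well recognized.
avatars = ["a1", "a2", "a3"]
truth = ["a1"] * 4 + ["a2"] * 4 + ["a3"] * 4
predicted = ["a1", "a1", "a2", "a2",
             "a1", "a2", "a2", "a1",
             "a3", "a3", "a3", "a3"]
C = normalized_confusion(truth, predicted, avatars)
print(looks_like_alias_pair(C, 0, 1), looks_like_alias_pair(C, 0, 2))
```

In this toy matrix the block for (a1, a2) is uniformly 0.5, so both conditions hold, while the pair (a1, a3) fails the similarity condition.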

3 Method 

Our method first extracts fuzzy concepts from the confusion matrix, then scores
and post-processes them to generate avatar pairs that are candidate aliases.

Fuzzy concepts in a confusion matrix. Let us define the set of fuzzy sets
$L^{\mathcal{A}}$ where $L = [0,1]$, such that the mapping function
$\delta : \mathcal{A} \to L^{\mathcal{A}}$ assigns membership values for the
avatar $a_i$ in a fuzzy set of $L^{\mathcal{A}}$ based on the normalized
confusion matrix. Simply, this is a mapping that assigns to $a_i$ its
corresponding row in $\hat{C}^{\rho}$, which we denote $\hat{C}^{\rho}_{i}$. We
model a confusion matrix $\hat{C}^{\rho}$ as a pattern structure
$(\mathcal{A}, (L^{\mathcal{A}}, \sqcap), \delta)$ [2]. The operator $\sqcap$ is
a meet operator in a semi-lattice (idempotent, commutative and associative), and
is defined as follows, given two avatars $a_i, a_j \in \mathcal{A}$:

$$\delta(a_i) \sqcap \delta(a_j) = \langle \min(\hat{C}^{\rho}_{ik}, \hat{C}^{\rho}_{jk}) \rangle, \quad k \in [1, |\mathcal{A}|]$$

$$\delta(a_i) \sqsubseteq \delta(a_j) \iff \delta(a_i) \sqcap \delta(a_j) = \delta(a_i)$$
Fig. 1. Confusion matrix

Example. Figure 1 illustrates a confusion matrix obtained from a classifier.
Actually, $\sqcap$ corresponds to the fuzzy set intersection and
$(L^{\mathcal{A}}, \sqsubseteq)$ is a partial order over the elements of
$L^{\mathcal{A}}$ which can be represented as a semi-lattice. The pattern
structure $(\mathcal{A}, (L^{\mathcal{A}}, \sqcap), \delta)$ is provided with two
derivation operators, forming a Galois connection [2]. Formally, for a subset of
avatars $A \subseteq \mathcal{A}$ and a fuzzy set $d \in L^{\mathcal{A}}$, we
have: $A^{\square} = \bigsqcap_{a \in A} \delta(a)$ and
$d^{\square} = \{a \in \mathcal{A} \mid d \sqsubseteq \delta(a)\}$.
The pair $(A, d)$ is a pattern concept iff $A^{\square} = d$ and
$d^{\square} = A$. Pattern concepts are ordered by extent inclusion such that
for $(A_1, d_1)$ and $(A_2, d_2)$ we have:
$(A_1, d_1) \leq (A_2, d_2) \iff A_1 \subseteq A_2$ (or
$d_1 \sqsupseteq d_2$). A pattern concept $(A, d)$ contains a fuzzy set $d$
which can be represented as a vector $d = \langle d_j \rangle$ of length
$|\mathcal{A}|$, where each value $d_j$ is the minimum over all rows $i$ in
column $j$ of the matrix $\hat{C}^{\rho}$ such that $a_i \in A$.
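The meet and the two derivation operators described above can be sketched directly on a normalized confusion matrix stored as a list of rows. This is only an illustration under stated assumptions: avatars are represented by their row indices, extents are assumed non-empty, and the helper names (`meet`, `intent`, `extent_of`, `is_pattern_concept`) are hypothetical rather than taken from the paper or from any library.

```python
def meet(rows):
    """Fuzzy set intersection: column-wise minimum of the given rows."""
    return [min(column) for column in zip(*rows)]


def subsumed(d, row):
    """d is below row in the order iff meet(d, row) == d,
    i.e. every component of d is at most the matching component of row."""
    return all(x <= y for x, y in zip(d, row))


def intent(C, extent):
    """First derivation operator: meet of the rows of the avatars in extent."""
    return meet([C[i] for i in sorted(extent)])


def extent_of(C, d):
    """Second derivation operator: all avatars whose row subsumes d."""
    return {i for i, row in enumerate(C) if subsumed(d, row)}


def is_pattern_concept(C, extent):
    """(extent, intent(extent)) is a concept iff deriving twice gives extent back."""
    return extent_of(C, intent(C, extent)) == set(extent)


# Toy normalized confusion matrix (rows sum to 1); indices 0 and 1 are confused.
C = [
    [0.5, 0.5, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
print(intent(C, {0, 1}))              # column-wise minima of rows 0 and 1
print(is_pattern_concept(C, {0, 1}))  # closed set of avatars
print(is_pattern_concept(C, {0, 2}))  # not closed: its intent is all zeros
```

The last call shows why closure matters: the meet of rows 0 and 2 is the zero vector, which every row subsumes, so the extent grows to all avatars and {0, 2} alone is not a pattern concept.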


Computing and scoring concepts. From the confusion matrix we compute all
possible pattern concepts using the AddIntent algorithm [4]. Pattern concepts
are then ranked according to a score and converted into a list of pairs. For
example, if a pattern concept extent contains three avatars $a_1$, $a_2$ and
$a_3$, we convert this concept into the pairs $(a_1, a_2)$, $(a_1, a_3)$ and
$(a_2, a_3)$. The scoring function $s : L^{\mathcal{A}} \to [0,1]$ is given as
follows: for a pattern $d$, $s(d) = \sum_{j=1}^{|\mathcal{A}|} d_j$.

Example. In Figure 1 we have: $s(\{a_1, a_2\}^{\square}) = 0.8$,
$s(\{a_4, a_5\}^{\square}) = 0.75$ and $s(\{a_1, a_2, a_4\}^{\square}) = 0.05$.

It is clear that the function $s$ is decreasing w.r.t. the order of pattern
concepts, i.e. $(A_1, d_1) \leq (A_2, d_2) \implies s(d_1) \geq s(d_2)$: adding
avatars to an extent can only lower the column-wise minima. Thus, pattern
concepts can be mined down to a given score threshold, analogously to how formal
concepts can be mined down to a given minimal support. The higher the score of a
given pattern, the more confused the classification by $\rho$ of the traces of
the avatars $a \in A$ and thus the stronger their candidacy for merging. This
property directly follows from the choice of our similarity operator $\sqcap$ as
a fuzzy set intersection, which behaves as a pessimistic operator (returning
minimum values).
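To make the scoring and ranking step concrete, here is a small sketch. It makes two assumptions that should be read as such: the score of a pattern is taken to be the sum of its components (our reading of the scoring function, consistent with its range and monotonicity), and candidate pairs are enumerated by brute force over all pairs of avatars instead of being derived from concepts computed with AddIntent. All function names are hypothetical.

```python
from itertools import combinations


def pattern(C, extent):
    """Column-wise minimum over the rows of the avatars in extent."""
    return [min(C[i][k] for i in extent) for k in range(len(C))]


def score(d):
    """Assumed scoring function: sum of the components of the pattern."""
    return sum(d)


def ranked_candidate_pairs(C, min_score=0.0):
    """Score every pair of avatars and return them by decreasing score.

    Brute-force stand-in for mining pattern concepts and converting
    their extents into pairs.
    """
    pairs = []
    for i, j in combinations(range(len(C)), 2):
        s = score(pattern(C, (i, j)))
        if s >= min_score:
            pairs.append(((i, j), s))
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs


C = [
    [0.5, 0.5, 0.0],
    [0.4, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
for pair, s in ranked_candidate_pairs(C):
    print(pair, round(s, 3))
```

Because the score can only decrease when an extent grows, a threshold such as `min_score` prunes larger groups as soon as one of their sub-groups falls below it, which is what makes threshold-based mining possible.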

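Ranking avatar aliases. Consider the clustering condition previously formalized
as $\hat{C}^{\rho}_{ii} \simeq \hat{C}^{\rho}_{ij} \simeq \hat{C}^{\rho}_{ji} \simeq \hat{C}^{\rho}_{jj}$
and $\hat{C}^{\rho}_{ii} + \hat{C}^{\rho}_{ij} + \hat{C}^{\rho}_{ji} + \hat{C}^{\rho}_{jj} \simeq 2$.
Consider that the pair of avatars $(a_i, a_j)$ respects these conditions. It is
easy to see that $(a_i, a_j)$ will necessarily be a candidate pair highly ranked
by the previous step:

$$\hat{C}^{\rho}_{ii} \simeq \hat{C}^{\rho}_{ji} \simeq \min(\hat{C}^{\rho}_{ii}, \hat{C}^{\rho}_{ji})
\quad\text{and}\quad
\hat{C}^{\rho}_{ij} \simeq \hat{C}^{\rho}_{jj} \simeq \min(\hat{C}^{\rho}_{ij}, \hat{C}^{\rho}_{jj})$$

$$\implies \min(\hat{C}^{\rho}_{ii}, \hat{C}^{\rho}_{ji}) + \min(\hat{C}^{\rho}_{ij}, \hat{C}^{\rho}_{jj}) \simeq 1$$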

Thus, the avatar clusters we are looking for are contained within the set of
candidate pairs and, moreover, they are highly ranked. In order to remove from
the list of candidates the pairs that do not satisfy the avatar cluster
definition, we propose a cosine similarity measure between two vectors computed
for each avatar as follows. Let $(a_i, a_j)$ be a candidate pair; the cluster
score is defined as:
$\text{cluster\_score}(a_i, a_j) = \text{cosine}(\langle \hat{C}^{\rho}_{ii}, \hat{C}^{\rho}_{ij} \rangle, \langle \hat{C}^{\rho}_{jj}, \hat{C}^{\rho}_{ji} \rangle)$.
The cluster score measures how close a candidate pair is to being an avatar
cluster. The logic comes from the following scenario. Consider that the traces
of avatar $a_i$ were all correctly classified, meaning that
$\hat{C}^{\rho}_{ii} = 1$, and that the traces of avatar $a_j$ were all
incorrectly classified as $a_i$, meaning that $\hat{C}^{\rho}_{ji} = 1$. The
corresponding section of the normalized confusion matrix then has the row
$(1, 0)$ for $a_i$ and the row $(1, 0)$ for $a_j$ (columns ordered $a_i$,
$a_j$). We can observe that the pair $(a_i, a_j)$ will be contained in the set
of candidate pairs and will be highly ranked, even though it is not an avatar
cluster since it violates the first condition. The cluster score for this
particular case compares
$\langle \hat{C}^{\rho}_{ii}, \hat{C}^{\rho}_{ij} \rangle = \langle 1, 0 \rangle$
with
$\langle \hat{C}^{\rho}_{jj}, \hat{C}^{\rho}_{ji} \rangle = \langle 0, 1 \rangle$
and is calculated as:
$\text{cluster\_score}(a_i, a_j) = \text{cosine}(\langle 1, 0 \rangle, \langle 0, 1 \rangle) = 0$,
meaning that this candidate pair is not an avatar cluster. Notice that for a
pair of avatars such that $\hat{C}^{\rho}_{ii} = 1$ and
$\hat{C}^{\rho}_{jj} = 1$, the cluster score is 1 (cosine between parallel
vectors) while the pair is not an avatar cluster. However, this pair would have
a score $s$ equal to 0 and would be at the bottom of the ranked candidate pairs.
A third kind of pair occurs when the traces of $a_i$ and $a_j$ are all
incorrectly classified as a third avatar $a_k$. In such a case, the cluster
score is 0.

The post-processing step is executed as follows. Given a ranked list of
candidate pairs yielded by the previous step, each pair is evaluated using the
cluster score. Given an arbitrary threshold $\lambda$, if the cluster score of
the candidate pair is below this threshold, then the pair is rejected. The
remaining candidate pairs are re-ranked into a final list of avatar clusters.
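The three scenarios discussed above can be reproduced with a few lines of Python. This is a sketch under assumptions: the cluster score is read as the cosine between the vectors (C[i][i], C[i][j]) and (C[j][j], C[j][i]), the cosine with a zero vector is set to 0 by convention to match the third scenario, and the filter keeps the incoming order of the surviving pairs because the text does not specify the re-ranking criterion. The function names and the default threshold are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros (convention)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def cluster_score(C, i, j):
    """Cosine between (C_ii, C_ij) and (C_jj, C_ji)."""
    return cosine((C[i][i], C[i][j]), (C[j][j], C[j][i]))


def filter_pairs(C, ranked_pairs, threshold=0.9):
    """Drop candidate pairs whose cluster score is below the threshold.

    Surviving pairs keep the order of the input ranking (an assumption).
    """
    return [(i, j) for (i, j) in ranked_pairs if cluster_score(C, i, j) >= threshold]


absorbed = [[1.0, 0.0], [1.0, 0.0]]    # a_j always predicted as a_i
separate = [[1.0, 0.0], [0.0, 1.0]]    # both avatars perfectly recognized
aliases = [[0.5, 0.5], [0.5, 0.5]]     # confusion evenly shared
third = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]  # both predicted as a_k

for name, C in [("absorbed", absorbed), ("separate", separate),
                ("aliases", aliases), ("third", third)]:
    print(name, round(cluster_score(C, 0, 1), 3))
```

Note that the "separate" case gets a cluster score of 1 but is harmless in practice, since its pattern score is 0 and it never reaches the top of the candidate ranking.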


4 Experiments 

Data collections and objectives. We constructed two collections of Starcraft
2 replays to test our method. A replay contains all the data necessary for the
game engine to replay the game. Replays are shared on dedicated websites¹ and
can be parsed to extract relevant features². The first collection was chosen
to study the accuracy of classifiers at recognizing avatars from their traces:
we selected 955 professional games of 171 unique players, which cannot contain
avatar aliases³. The second collection, which may contain avatar aliases, is
built from all replays available on SpawningTool.com in July 2014, for a total
of 10,108 one-versus-one games and 3,805 players. This collection corresponds to
a real-world situation, and is used for evaluating our avatar alias resolution
approach.

Classifying avatars. Our method analyses the confusion matrix of a given
classifier $\rho$. Good features, as well as a prediction method, should first
be chosen. As features, we use the hotkey usage count [5] during the first
$\tau$ seconds of the game: there are 30 such features
($\{0, \ldots, 9\} \times \{\text{assign}, \text{remove}, \text{select}\}$). We
also consider the faction of the player, the game outcome (winner or loser)
and the average number of actions per minute (APM). We generated several
datasets by varying the $\tau$ parameter, and also introduced a minimum number
$\theta$ of games an avatar should have to be considered in the dataset. Each
dataset is classified using the

1 http://wiki.teamliquid.net/starcraft2/Replay_Websites 

2 http://sc2reader.readthedocs.org/ 

3 http://wcs.battle.net/sc2/en/articles/wcs-2014-season-2-replays








Weka machine learning software⁴ and evaluated using 10-fold cross validation,
from which we obtain a confusion matrix. We chose four different classifiers,
namely K Nearest Neighbours (knn), Naive Bayes (nbayes), J48 decision tree
(j48) and Sequential Minimal Optimization (smo). Parameters for each of
the classifiers were left at their defaults. Figure 2 shows the ROC area and the
precision obtained for 92 datasets created for Collection 1. The parameter
$\tau$ ranged over 23 values on an exponential scale, initially from 10 to 90
seconds, then from 100 to 900 and finally from 1000 to 5000 seconds (the longest
game in this collection lasts around 5300 seconds); thus, the x axis of each
figure is in logarithmic scale. For each measure, four figures corresponding to
four different settings of $\theta$ are presented. Each line corresponds to a
different classifier. The figures empirically confirm the initial assumption
that avatars are easily recognizable from the signatures left in the traces they
generate while playing. For each setting, the ROC area is always close to 100%,
showing the robustness of the approach under different parametrizations.
Precision always stays above 60%, reaching its minimal value for the SMO
classifier with $\theta = 5$ and $\tau > 1000$. This also supports the following
assumptions. Firstly, it is hard to recognize users that have played only a few
games, meaning that the larger the value of the $\theta$ threshold, the more
discriminative power the classifier has. Secondly, users are recognizable in the
first few minutes of the game. The precision curves show a slightly concave
behaviour, hinting at a maximum of the precision w.r.t. the time cut used for
traces. Users can be efficiently discovered by their hotkey binding settings.
As the game progresses, traces may differ, given that the number of options in
the game greatly increases and their execution varies with the opponent.
Fig. 2. Classification results for Collection 1: ROC area under the curve (AUC) and precision
distribution on 23 points of $\tau$ for four $\theta$ values.
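As an illustration of the 30 hotkey features used for classification, the sketch below counts hotkey events within the first tau seconds of a game. The event format (a tuple of second, key and action) is a hypothetical, pre-parsed representation chosen for this example; it is not the output format of any replay parser, and the remaining features mentioned in the text (faction, outcome, APM) are left out.

```python
from itertools import product

KEYS = range(10)                           # hotkeys 0..9
ACTIONS = ("assign", "remove", "select")   # action names as listed in the text


def hotkey_features(events, tau):
    """Return a 30-dimensional count vector, ordered by key then action.

    events: iterable of (second, key, action) tuples (assumed format).
    tau: only events occurring within the first tau seconds are counted.
    """
    counts = {(key, action): 0 for key, action in product(KEYS, ACTIONS)}
    for second, key, action in events:
        if second <= tau and (key, action) in counts:
            counts[(key, action)] += 1
    return [counts[(key, action)] for key, action in product(KEYS, ACTIONS)]


# Toy trace: three early hotkey events and one late event that tau=30 cuts off.
trace = [(3, 1, "assign"), (5, 1, "select"), (6, 1, "select"), (120, 2, "select")]
features = hotkey_features(trace, tau=30)
print(len(features), sum(features))
```

Varying `tau` on the same trace yields the family of datasets described above, one per time cut.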


4 http://www.cs.waikato.ac.nz/ml/weka/ 





































































Main method evaluation strategy. As we do not have information about the
users behind the avatars, it is not possible to evaluate the candidate avatar
pairs against a “ground truth”. Hence we evaluated our approach
using three different strategies. First, consider that a Starcraft 2 avatar is
given by its Battle.net account URL, made of a server name (Europe, America,
etc.), a unique identifier, and an avatar name. We use the whole URL as the
avatar class label in our classifiers. Note that players may have several
accounts, on different servers, that share the same name. Players can also
change the name of their avatar: this does not affect the ID and server that
identify their account. As our method returns an ordered list of candidate pairs
for merging, we consider the following indicators for each pair.

- Avatar names. Two avatars may have the same name but different Battle.net
IDs. It is a weak indicator, as it can be a common name (e.g. Batman).

- Battle.net account unique ID. Two avatars may have two different names
but the same unique identifier. This is a strong indicator.

- Surrogates. We create surrogate avatars $a^1$, $a^2$ from an avatar
$a \in \mathcal{A}$ by partitioning its traces into two subsets. Our goal is to
retrieve that $a^1$ and $a^2$ are avatar aliases. For splitting the traces of
an avatar into surrogates, we introduce a parameter $\beta$ as the balance
between the traces distributed over the surrogate avatars ($\beta = 0.5$ means
that each surrogate avatar receives half of the traces). We introduce other
parameters: $\gamma$ is the proportion of avatars that are converted into
surrogate aliases. We assume that professionals play a lot of games, so we
select the avatars which have played more than $\theta$ games. As we have
observed, it is not necessary to analyse the entire replay to discriminate an
avatar, so we select only the first $\tau$ actions or seconds.
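The surrogate construction in the last item can be sketched as follows. Several details are assumptions made for the example rather than statements from the paper: the first surrogate receives a fraction beta of the traces, the avatars to split are the gamma fraction with the most traces, traces are shuffled before splitting, and the "#1" / "#2" naming is purely illustrative.

```python
import random


def make_surrogates(traces_by_avatar, gamma, beta, theta, seed=0):
    """Split the most active avatars into two surrogate aliases.

    Returns (dataset, truth): the relabelled traces and the list of
    surrogate pairs that an alias detector is expected to retrieve.
    """
    rng = random.Random(seed)
    # Keep only avatars with more than theta games, most active first.
    eligible = sorted(
        (a for a, traces in traces_by_avatar.items() if len(traces) > theta),
        key=lambda a: len(traces_by_avatar[a]),
        reverse=True,
    )
    chosen = set(eligible[: int(gamma * len(eligible))])
    dataset, truth = {}, []
    for avatar in eligible:
        traces = list(traces_by_avatar[avatar])
        if avatar not in chosen:
            dataset[avatar] = traces
            continue
        rng.shuffle(traces)
        cut = int(round(beta * len(traces)))  # beta = 0.5 gives an even split
        first, second = f"{avatar}#1", f"{avatar}#2"
        dataset[first], dataset[second] = traces[:cut], traces[cut:]
        truth.append((first, second))
    return dataset, truth


toy = {"A": list(range(10)), "B": list(range(8)),
       "C": list(range(6)), "D": list(range(6)), "E": [1, 2]}
dataset, truth = make_surrogates(toy, gamma=0.5, beta=0.5, theta=5)
print(sorted(dataset), truth)
```

Since the surrogate pairs are known by construction, they provide a controlled set of true positives even though the real player behind each avatar is unknown.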


To evaluate our approach we measure the precision, recall and F-measure
of the first 100 ranked avatar clusters. Given the ranking, TP, FP and FN stand
for true positives, false positives and false negatives respectively. A pair is
a false positive when we do not have enough information to consider it a true
positive, meaning that the avatar names do not match, the URLs are different and
the avatars are not part of our own set of surrogate avatars. Such pairs are in
fact the kind of pairs we are looking for. As an example, Figure 3 shows the
initial candidate pairs extracted from a confusion matrix generated by a
Sequential Minimal Optimization (SMO) classification with $\gamma = 0.05$,
$\theta = 5$ and $\lambda = 0.9$. Within the figure, a point represents a pair
of avatars, drawn as a red circle if the avatars are surrogates, a green
triangle if they have the same account, a yellow star if they have the same
name, and a blue cross otherwise, annotated with the nicknames of the avatars.
The only FP in this list is a pair of avatars that belong to the player known as
aLive⁵. We also report three other measures, namely P@10 (precision in the
first 10 elements of the ranking), mean average precision (MAP), and the area
under the receiver operating characteristic (ROC) curve (AUC).

Fig. 3. Candidate pairs ranking.
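For reference, the ranking measures mentioned above can be computed from a list of relevance flags ordered by rank. This is a generic sketch of the standard definitions, not the evaluation code used for the reported results; in particular, normalizing average precision by the number of relevant pairs found in the ranking is an assumption.

```python
def precision_at_k(relevance, k):
    """Fraction of relevant items among the first k positions of the ranking."""
    if k <= 0:
        return 0.0
    return sum(1 for rel in relevance[:k] if rel) / k


def average_precision(relevance):
    """Mean of the precision values measured at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average of the average precision over several rankings."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# True marks a pair accepted as a true positive (surrogate, same URL or same name).
ranking = [True, True, False, True]
print(precision_at_k(ranking, 2), precision_at_k(ranking, 4))
print(round(average_precision(ranking), 4))
```

A ranking that places all true pairs first obtains an average precision of 1, which is why this measure rewards the ordering produced by the scoring step and not only the set of retrieved pairs.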

Identifying multiple aliases. The goal of these experiments is to assess our
approach for finding avatar aliases based on the evaluation introduced above.
For generating datasets, we selected three different $\tau$ values, namely 30,
60 and 90 seconds. We picked the same values for $\theta$ as in the previous
experiments. Surrogates were generated for the first 5, 10, 15 and 20 percent of
the most active users in the dataset ($\gamma$) and we set the balance
$\beta = 0.5$. For each of the previously selected classifiers, the confusion
matrix was processed by the Sephirot⁶ AddIntent implementation to obtain a set
of pattern concepts. Scoring and post-processing were implemented in ad-hoc
Python scripts. Table 1 shows a summary of the evaluation results using the top
100 pairs of avatar clusters. Results indicate that our approach is very
efficient at identifying surrogate avatars, particularly for the KNN and J48
classifiers, which achieve very high recall values. In the upper part of the
table, while precision is low, it is worth noticing that in the top 100 there
are only 41 surrogates, meaning that the maximum achievable precision is 0.41.
The KNN classifier is particularly good on this measure, achieving an almost
perfect value (0.40 out of 0.41). All four classifiers achieve a very high
precision in the first 10 results (P@10), and two of them get a perfect score.
Indeed, one of the main characteristics of our approach is the good ranking it
generates over the avatar pairs. This fact is confirmed by the good MAP and ROC
area under the curve (AUC) values achieved by all four classifiers. Both these
measures slightly degrade when URLs and names are included in the set of true
positives. This can be understood since not all avatars with the same name
necessarily belong to the same user. Thus, pairs of avatars with the same name
will be more evenly distributed over the ranking, or can even be found at the
bottom, indicating that they do not belong to the same user. This fact is
reflected in the gap between the high growth of precision and the low
degradation of recall, i.e. avatars with the same name are distributed between
the pairs retrieved and those that were not. As we have discussed, avatars with
the same URL necessarily belong to the same user. Hence, we would have expected
to find an even distribution of surrogates and URLs in the first 10 pairs
retrieved. Instead, for all classifiers, P@10 is made of more than 80%
surrogates (while the rest is always URLs; see P@10 in the middle part of the
table).

Table 2 shows a summary of results when looking for just surrogates while
varying the balance in the distribution of traces between them. We can clearly
observe that the performance of the approach quickly degrades as the
distribution becomes more imbalanced (the higher the $\beta$ value). Actually,
for some classifiers it is not possible to obtain a single good result, even
when we lowered the $\lambda$ threshold to 0.8. As URLs are not necessarily
balanced, classifiers tend to assign a trace belonging to an avatar with fewer
traces to an avatar with more traces. Issues
5 http://wiki.teamliquid.net/starcraft2/ALive

6 https://code.google.com/p/sephirot/



related to learning from imbalanced datasets are reviewed in [3] and need to be 
considered when selecting a proper classifier for our particular application. 
Table 1. Main results

Table 2. Varying surrogate balance ($\beta$)


5 Conclusion 

We introduced the problem of avatar aliases identification when there exists no
mapping between individuals and their avatars. This is an important problem
for game editors, but also for e-sport structures. Our method relies on the fact
that behavioural data hide individual characteristic patterns, which allows
predictive approaches to be very accurate. Nevertheless, this good performance
quickly degrades when the data hide avatar aliases, which is why we based our
analysis on confusion matrices. As future work, we plan to study other
competitive games, and how biclustering could tackle the problem. We also
believe that our approach can be used to solve other application problems, such
as identifying users across different devices (smartphones, tablets, computers,
etc.) from the usage traces they leave.